#https://datatables.net/reference/option/
options(DT.options = list(scrollX = TRUE, paging = TRUE, fixedHeader = TRUE, searchHighlight = TRUE))
BASICS
Chapter 4: The Ames housing data
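# Packages used below (assumed; the original does not show its setup chunk)
library(tidymodels)  # modeldata (ames), rsample, dplyr, ggplot2
library(janitor)     # clean_names()
library(lubridate)   # is.Date()
library(plotly)      # plot_ly(), ggplotly()
data(ames)
a = ames %>%
  clean_names() %>%
  # order columns alphabetically, then group them by type
  select(sort(tidyselect::peek_vars())) %>%
  select(
    where(is.Date),
    where(is.character),
    where(is.factor),
    where(is.numeric)
  )
a %>% head() %>% DT::datatable()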
4.1 Exploring important features
Distribution of the outcome variable, sale price
a %>% plot_ly(x = ~sale_price) %>% add_boxplot()
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
a %>% plot_ly(x = ~sale_price) %>% add_histogram()
1. The distribution is right-skewed, with many high outliers, so we should log-transform the sale price.
2. Note: log-transforming decreases the interpretability of the values.
ggplotly(a %>% ggplot(aes(sale_price)) + geom_histogram(bins = 50) + scale_x_log10())
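The effect of the log scale on skew can also be checked numerically. The sketch below is only an illustration: it uses simulated log-normal "prices" rather than the Ames data, and `skewness()` is a small helper defined here (a moment-based estimate), not a function from any package used in these notes.

```r
# Simulated right-skewed "prices" (log-normal); NOT the Ames data.
set.seed(1)
price <- rlnorm(1000, meanlog = 12, sdlog = 0.4)

# Hypothetical helper: moment-based sample skewness,
# i.e. the mean of the cubed standardized values.
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skew_raw <- skewness(price)         # clearly positive: long right tail
skew_log <- skewness(log10(price))  # near zero: roughly symmetric
c(raw = skew_raw, log10 = skew_log)
```

A skewness near zero on the log10 scale is what the histogram with `scale_x_log10()` shows visually.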
a$sale_price = log10(a$sale_price)
Transforming the outcome variable will probably result in better models than using the untransformed data.
The downside of transforming the outcome is mostly related to interpretation: predictions and errors are then on the log10 scale rather than in dollars.
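To make the interpretation point concrete, here is a minimal sketch of moving between the log10 scale and dollars. The prediction values are made up for illustration and do not come from any model in these notes.

```r
# Hypothetical predictions on the log10 scale (made-up values).
pred_log10 <- c(5.0, 5.3)

# Back-transform to dollars by raising 10 to the prediction.
pred_dollars <- 10^pred_log10

# An error of 0.1 log10 units is a multiplicative error in dollars:
# the prediction is off by a factor of 10^0.1 (about 1.26, roughly 26%).
error_factor <- 10^0.1

pred_dollars
error_factor
```

So an error that looks small on the log10 scale is a percentage error in dollars, not a fixed dollar amount.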
Chapter 5: Spending (splitting) our data
5.1 Common methods for splitting data
set.seed(123)
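# The argument is `prop`, not `prob`. The original call used `prob = 0.8`,
# which does not match `prop`, so the default 3/4 split was used instead.
(split = a %>% initial_split(prop = 0.8, strata = sale_price))
## Output recorded from the original run (default prop = 3/4, not 0.8):
## <Analysis/Assess/Total>
## <2199/731/2930>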
train = training(split)
test = testing(split)
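One way to check a stratified split is to compare the realized proportion and the outcome quantiles in both sets. The sketch below assumes the rsample package and uses a simulated skewed outcome, not the Ames data; the names `df`, `split_demo`, `train_demo`, and `test_demo` are illustrative only.

```r
library(rsample)

set.seed(123)
# Simulated right-skewed outcome; NOT the Ames data.
df <- data.frame(y = rlnorm(1000, meanlog = 12, sdlog = 0.4))

# For a numeric strata variable, rsample bins the values
# (quartiles by default) and samples within each bin.
split_demo <- initial_split(df, prop = 0.8, strata = y)
train_demo <- training(split_demo)
test_demo  <- testing(split_demo)

# Realized training share; should be close to 0.8.
train_share <- nrow(train_demo) / nrow(df)
train_share

# With stratification, the quartiles of y should be similar in both sets.
rbind(
  train = quantile(train_demo$y, c(0.25, 0.5, 0.75)),
  test  = quantile(test_demo$y,  c(0.25, 0.5, 0.75))
)
```

If the training share comes out near 0.75 instead of 0.8, that is a hint that the `prop` argument was not picked up.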
Chapter 6: Feature engineering with recipes
Chapter 7: Fitting models with parsnip
Chapter 8: A model workflow
Chapter 9: Judging model effectiveness
TOOLS FOR CREATING EFFECTIVE MODELS
Chapter 10: Resampling for evaluating performance
Chapter 11: Comparing models with resampling
Chapter 12: Model tuning and the dangers of overfitting
Chapter 13: Grid search
Chapter 14: Iterative search
Chapter 15: Explaining models and predictions